scx_lavd: replenish time slice at ops.running() only when necessary #250
Conversation
The current code replenishes a task's time slice whenever the task enters ops.running(). However, there is a case where this behavior can starve other tasks, causing a watchdog timeout error. One such case (if not the only one) is when a running task is preempted by a higher scheduler class (e.g., RT, DL). The task then transitions through a cycle of ops.running() -> ops.stopping() -> ops.running() -> and so on. Each time it resumes running, it is placed at the head of the local DSQ and ops.running() renews its time slice. Hence, in the worst case, the task can run forever because its time slice is never exhausted. The fix is to assign the time slice only once, by checking whether the time slice has already been calculated.

Suggested-by: Tejun Heo <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
@@ -49,15 +49,16 @@ enum consts {
	NSEC_PER_USEC		= 1000ULL,
	NSEC_PER_MSEC		= (1000ULL * NSEC_PER_USEC),
	LAVD_TIME_ONE_SEC	= (1000ULL * NSEC_PER_MSEC),
	LAVD_TIME_INFINITY_NS	= 0xFFFFFFFFFFFFFFFFULL,
You should be able to use SCX_SLICE_INF, which is the same u64 max value. Note that if a running task has this slice value, the tick is stopped. I don't think lavd ever actually ends up running tasks with this value tho, so not really a concern, but just something to note.
Thanks! I will update it accordingly.
 * the task can run forever.
 */
return p->scx.slice == LAVD_SLICE_UNDECIDED ||
       p->scx.slice == SCX_SLICE_DFL;
So, the only time the kernel assigns SCX_SLICE_DFL is when the currently running task is the only runnable task on the CPU. When the task's slice expires, the kernel sets the slice to the default value and keeps running the task. This is a convenience feature which can be disabled by setting SCX_OPS_ENQ_LAST in ops.flags. When the flag is set, the task will always be enqueued when its slice expires, whether it's the last runnable task on the CPU or not. When the last task is enqueued, ops.enqueue() is called with the SCX_ENQ_LAST flag:
/*
* The task being enqueued is the only task available for the cpu. By
* default, ext core keeps executing such tasks but when
* %SCX_OPS_ENQ_LAST is specified, they're ops.enqueue()'d with the
* %SCX_ENQ_LAST flag set.
*
* If the BPF scheduler wants to continue executing the task,
* ops.enqueue() should dispatch the task to %SCX_DSQ_LOCAL immediately.
* If the task gets queued on a different dsq or the BPF side, the BPF
* scheduler is responsible for triggering a follow-up scheduling event.
* Otherwise, Execution may stall.
*/
SCX_ENQ_LAST = 1LLU << 41,
For now, I will keep the code as it is, but later, when the per-tick preemption code is ready, I will change it.
Thank you!
As we haven't cut releases for a while, imma land this now, bump versions and cut a new one.